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1 INTRODUCTION 



Abstract 

We introduce a new class of latent process models for dynamic relational network data with the goal 
of detecting time-dependent structure. Network data are often observed over time, and static network 
models for such data may fail to capture relevant dynamic features. We present a new technique for 
identifying the emergence or disappearance of distinct subpopulations of vertices. In this formulation, a 
network is observed over time, with attributed edges appearing at random times. At unknown time points, 
subgroups of vertices may exhibit a change in behavior. Such changes may take the form of a change 
in the overall probability of connection within or between subgroups, or a change in the distribution of 
edge attributes. A mixture distribution for latent vertex positions is used to detect heterogeneities in 
connectivity behavior over time and over vertices. The probability of edges with various attributes at a 
given time is modeled using a latent-space stochastic process associated with each vertex. A random dot 
product model is used to describe the dependency structure of the graph. As an application we analyze 
the Enron email corpus. 



1 Introduction 



Network data are often observed over time, with evolving connections between nodes. Our goal is to detect 
time- dependent structure in network data, finding change points at which the characteristics of the network 
are altered. We consider "streaming" network data, a system in which connections between n vertices are 
observed over a time period (0, T). At random times (ti, £2, • • • , £iv)> undirected connections between vertices 
are created, each of which may be associated with a categorical attribute contained in {1, 2, . . . , K}. In 
particular, we study connections which have no temporal duration, such as emails. At unknown time points, 
subgroups of vertices may exhibit a change in behavior. Such changes may take the form of a change in the 
overall probability of connection within or between subgroups, a change in the distribution of edge attributes, 
or both. Examples of emerging subpopulations with distinct behavior may include bursts of "chatter" activity 
among a group of individuals, or the creation of new communities of vertices such as parents of members of 
a little league team at the beginning of a new season. 

Real world networks, particularly social networks, generally exhibit structure beyond that which can be 
explained by simple stochastic models. Common features of network structure include heterogeneity across 
vertices with respect to number and types of connections observed, transitivity, and clusters of vertices which 
are more likely to be connected to one another. One approach to modeling network structure is to partition 
the graph, grouping together similar vertices. Similarity can be denned in various ways. Many analyses of 
network data focus on discovering community structure, in which groups of vertices that have a high degree of 



connection between them belong to the same community or block ( Girvan and Newman 2002 Airoldi et al 
2008). Vertices can also be classified according to structural equivalence. A group of structurally equivalent 



vertices have the same "role" in the network, exhibiting similar patterns of connectivity with other groups 
of vertices. Stochastic block models (Wang and Wong 1987 Snijders and Nowicki, 1997) parameterize 



block structure by assigning within and between-block probabilities of connection, and weighted membership 
vectors specifying to which blocks a vertex is likely to belong. 

Latent space models characterize vertex behavior by projecting each vertex onto a low-dimensional space. 
This approach is fruitful both in constructing analytically tractable models, and in creating a framework for 



comparison across vertices. HofT et al. (2002) propose a class of latent space models in which the probability 
of connection between two vertices depends on their respective latent positions through a Euclidian distance 
function. Vertices with neighboring latent positions are more likely to be connected than those with distant 
positions. After conditioning on latent vertex positions, the edges are independent of one another. A detailed 



review of statistical network modeling is available in Goldenberg et al. (2010). 



Models which define the probability of an edge for a given pair of vertices as a function of latent positions 
have essentially two possible formulations. The first is a general latent domain in which unconstrained 
functions of positions (such as distances) are transformed to probabilities using a link function; the second is 
a constrained latent space which produces valid edge probabilities without transformation via a link function. 

We take the latter approach, defining the latent space as the if-dimensional simplex and using the inner 
product between latent positions to define the probability of connection. This "random dot product model" 



for undirected connections is described by Scheinerman and Tucker (2010). The properties of this model 
with respect to random graph properties such as clustering, diameter and degree distributions are explored 
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Young and Scheinerman (2007). In a related model, HofT (2009) describes the probability of directed 



connections using a multiplicative latent factor model in which each vertex is endowed with unobserved 
sender and receiver characteristics. 

Graph partitions can be defined using the similarity measure induced by distances in latent space. |Hand- 



cock and Raftery (2007) apply a model-based clustering algorithm to latent positions in a Euclidean space 



to infer a latent cluster structure over vertices. Latent positions are assumed to be drawn from a Gaussian 
mixture, and the probability of connection varies with the pairwise distance between latent positions. The 
latent mixture model induces clustering in the observed network. 

We are interested in estimating partitions that identify dissimilarity in connectivity behavior across time 
as well as across vertices. While most network analysis assumes a static topology, dynamic network models 
have been proposed in contexts where data on the temporal location of connections is available. Dynamic 



mixed- membership stochastic blockmodels (Xing et al. 2010) are an extension of the mixed- membership 



stochastic blockmodel (Airoldi et al. 2008) in which the vectors identifying block membership (for each 
node) can change over time, such that each actor may take on various roles in a network at different times. 
Latent space models have also been generalized to include dynamic positions ( |Sarkar and Moore] 2005 



Westveld and Hoff, 2010 Xu and Zheng 2009) . We are particularly interested in dynamic position models 



which can be analyzed to detect change points at which there is a shift in network behavior. 

In many applications observed edges contain additional information on the type of connection between 
two vertices. We develop a novel treatment of the "attributed edge" case (sometimes called "colored" edges) 
in which each edge has a categorical attribute contained in a finite set, in addition to our model for the 
standard unattributed edge case. Attributes may be observed directly or, as in the case of the Enron email 
data, assigned by a classification procedure ( Priebe et al.||2010| ). For data with attributed edges, we use the 
additional information to construct a more complex description of vertex behavior, in which vertices vary 
with respect to the types of connections they are associated with as well as probabilities of connection. 



Lee and Priebe (2011) propose a dynamic dot product model for attributed graphs in which each vertex 



is associated with a latent stochastic process. Using a mathematically tractable approximate model, they 
develop a test statistic for detecting changes in attributed multigraphs observed at discrete times. The 
changes of interest are shifts in the behavior of an unknown anomalous subgroup of vertices. We approach a 
similar problem using a related latent position model, and extend the detection problem to multiple change 
points and groups of vertices. |Costa et aL (2007) investigate a related detection task in a different context, 
developing a scan statistic to detect spatio-temporal disease clusters. 

We propose a dynamic latent position model in which the presence and attributes of observable edges 
depend probabilistically on the latent positions of the associated vertices through the random dot product 
model. Through the inferred latent positions, we can identify dissimilarity in behavior across groups of 
vertices or through time. We introduce a mixture model in which latent positions are drawn from either a 
homogeneous or heterogeneous distribution with respect to vertices and time. The goal of the analysis is 
to detect change points at which there is an identifiable shift between heterogeneity and homogeneity (in 
either direction) in the network. We approach the possibility of multiple change points through an iterative 
partitioning procedure. 

Network data are complex. We aim to construct a simple model which can be used to detect time- varying 
structure in a variety of datasets. If more subtle features of the network are of interest, an initial evaluation of 
time-dependent block structure may be useful in segmenting the data before fitting a more elaborate model. 
Practicality of implementation is an important issue in network modeling, and we have found that our model 
can be fit relatively quickly for moderately large datasets. As an application we analyze the Enron email 
corpus. 



2 Model 

We introduce a generative model for streaming network data in which a population of n vertices are observed 
continuously in time, with edges appearing at random times, possibly with categorical attributes. The density 
of edges over time is of interest, as is the density of edges across the population of vertices. The distribution 
of edges over time is assumed to follow a doubly stochastic Poisson process model. The generative model is 
as follows: 



2 



2 MODEL 



1. Generate a point process (£1,^2, • • • ,t N +) on the interval (0,T) according to a simple Poisson process 
with rate A. 

2. For each j = 1,...,7V + : 

(i) Randomly choose a pair (uj,Vj) such that Uj ^ Vj from the population of n vertices. 

(ii) For vertices uj and Vj, generate latent positions X u .(tj) and X v .(tj) from distributions F(9 u .(tj)) 
and F(0 Vj (tj)) independently. The vertex-specific parameter processes v (t) are described in sec- 



tion ELU 

(iii) Unattributed edge case: generate a Bernoulli random variable Zj such that 

K 



p( Zj = 1) = x U] ( tj ) • x Vj ( tj ) = x% (tjH? (W. (i) 



k=l 



If Zj = 1, draw an edge between and 

Attributed edge case: Draw an element kj from the set 0, 1, 2, . . . , K such that 



■p(L, _ jL\ _ J X ^ (tj) X Vj\tj) if > 0, / 9 \ 



where Xu- (tj) and x^. (tj) are the /cth elements of vectors X u .(tj) and X Vj (tj), respectively. If 
kj > 0, draw an edge with attribute kj between Uj and Vj. 

The underlying poisson process (£1, £2 • • • ? £/v+) ^ (0?^) creates edge opportunities uniformly over the 
graph with exponentially distributed inter- arrival times. For simplicity, we consider data observed over a 
fixed interval (0,T), but a related model for ongoing data acquisition could be constructed in which each 
new edge opportunity occurs at an exponentially distributed interval after the last. Each pair of vertices 
has the same expected number of edge opportunities, and the expected number of edges opportunities does 
not change over time. The edge opportunity process is unobserved; what we observe are the realized edges, 
a filtered version of the edge opportunity process. Through this filtering, we can model inhomogeneities in 
number and attributes of edges over time and over the graph. A schematic for this two-level process is given 
in Figure 1. 

The probability an edge opportunity at time tj will be realized depends on the latent processes associated 
with vertices Uj and Vj. At each time t, Uj and Vj occupy positions X Uj (t) and X v .(t) in a latent space. We 
define the latent space S to be the subset of R K bounded by the if-dimensional simplex: 

K 

S = {xeR^ :^2x k <l}. (3) 

k=l 

The position of vertex v in the latent space at time £, X v (t), is generated from the distribution F(0 v (t)). In 
what follows, we take F to be the (K + l)-dimensional Dirichlet distribution. To allow for flexible modeling 
of edge probabilities, we take X v (t) to be the first K components of a (K + l)-dimensional Dirichlet random 
variable, i.e. if Y = (2/1, . . . , y^K+i)) ~ Dir(0 v (t)), X v (t) = (yi, . . . , yx)- Note that since Dirichlet random 
variables must sum to one, the (K + l) st component is fixed given the first K. 

The relative locations of the two vertices in latent space determine the probability that edges between 
them will be realized. The probability an edge between vertices u and v will be realized at time t depends 
on X u (t) and X v (t) through the dot product: 

K 

P( Zj = MX^t^X^tj)) = P{k 3 > 0|X Mj .fe),X^-)) = X Uj (tj) ■ X Vj (tj) = J^^^KHtj). (4) 

k=l 

The dot product captures inhomogeneities in the probability of connection across different vertex pairs. In 
particular, two types of inhomogeneities can be easily expressed through the dot product: differences in overall 
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Unobserved edge opportunity process 



t = t P 



■•• t = t 3 ... t = t 4 

Observed edge process 



t = t N 



t = t 1 



t = t 4 



Figure 1: An underlying unobserved Poisson process (top) generates opportunities for edges between pairs of 
vertices. The observed edge process (bottom) is a filtered version of the edge opportunity process in which 
edges may have categorical attributes. 



connectivity for an individual vertex, and clustering, or the tendency for some groups of vertices to be more 
likely to be connected with one another. The former can be expressed through variations in the magnitude 
of the latent position vector. As ||X v (t)|| approaches (i.e., the latent position approaches the origin), the 
dot product X v (t) ■ X u (t) will approach for any other vertex u. Similarly, as approaches 1, the 

probability of realized edges between v and all other vertices will increase. A low probability of connection 
can also be expressed through latent position vectors which are nearly orthogonal. 

The second type of inhomogeneity, clustering, can be expressed through the angles between vertices. For 
vertex pairs with a large angle between them (different directions in the latent space), the dot product will 
be small, even if the overall magnitude of each vector is large. When vertex pairs point in the same direction 
in the latent space, they will be more likely to communicate. 

Using the dot product model, clusters in the latent space will give rise to clusters in the graph, as nearby 
vertices, with respect to angle from the origin, are more likely to communicate with one another. We are 
therefore interested in analyzing the structure of the graph in terms of distributions of latent positions. 

The generative process described above produces a collection of events denoted e + in the unattributed 
edge case and a + in the attributed edge case. Each event has a time stamp tj, a vertex pair (uj,Vj), and 
either an edge indicator variable Zj or an attribute ky. 

e+ = {{t h u h v h z^3 = 1...N+} = (T+,U+,V+,Z+) (5) 

and 

a+ = { (tj ,u j ,v j ,k j );j = l...N+} = (T+,U + ,V + ,K+) (6) 

where T+ = {t jt j = 1 . . . N+}, U+ = { Uj ,j = 1 . . . N+}, V+ = { Vj ,j = 1 . . . N+}, Z+ = { Zj ,j = 1 . . . N+}, 
and K + = {kj,j — 1 . . . iV + }. The collections e + and a + contain events for which Zj = or kj = 0. Since 
no edges are created in these events, we do not observe them. We observe e or a , subsets of e + and a + for 
which edges are realized: 

e = {(tj,u h Vj) : Zj = 1} = {(U, m,Vi):i = l...N} = (T, U, V) (7) 

or 

SL={(t j ,u j ,v j ,k j ) :kj >0} = {(t i ,u i ,v i ,ki):i = l...N} = (T,U,V,K). (8) 
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As an example, consider e to be a record of email messages, each of which contains a sender /receiver pair 
and time label, and a to be a similar collection which also contains a categorical email topic attribute. 

Using the latent positions model, we are interested in performing inference on 1? . . . , n , the parameters 
describing the distributions of the latent positions. If V = 6 for v = 1 . . . n, each vertex pair will have the 
same expected number of edges. In the general case, if U = V for a particular pair (u, v) vertices u and v 
may be more likely to communicate, depending on their overall probability of edge creation. Given the latent 
positions, the probability of the complete collection of events can be factored as 

P(e+|X,A) = P(U + ,V + \N+)P(Z + \X,N+)P(T + ,N + \\) (9) 

(XT) N+ e~ XT 

j:Zj = l j-zj=0 

where C UjVi t is a constant which does not depend on X or 0. The first product term describes the N elements 
of e, i.e. the observed edges. The second term describes the subset of elements of e + which are unobserved, 
and therefore not in e. The number of edge opportunities 7V + and the latent vertex positions at unobserved 
events appear in the likelihood, but are unobserved. If we assume the Poisson process parameter A is given, as 
will be discussed in Section [3j we can work with the expectation of the likelihood 1(0 ^ e + ) given the observed 
data e : 

oo N (\rp\N+ p -\T 

E e+ (l(9,e+)\e) = C u , v , t X\X u M-XvM(X-E(X u .X v \0))( ^-WVlJ— (io) 

7V+=0 i=l 
N 

= C u , v , t l[XM ■ x vM) (1 - E(Xu ■ X v \0)r N e- {XT)E{X "- X " l9) , (11) 
1=1 

where the second equality is simply an identity of the exponential function. This allows us to fit the model 
based on the observed data e or a. 

2.1 Change Points 

We are interested in detecting changes in the behavior of the network, particularly changes in which a subset 
of vertices alter their behavior as a group. Where changes exist, we seek to partition the network over 
vertices and over time, detecting time-dependent inhomogeneities. Through an iterative procedure, it will 
ultimately be possible to detect multiple time points at which the network changes behavior, and multiple 
subpopulations of vertices exhibiting distinct behavior. As an initial step, we define a partition to identify 
unusual behavior in one subgroup of vertices over a single unknown time interval. A schematic is given in 
Figure 2. 

Recall that X v (t), the latent position of vertex v at time t, is drawn from a distribution F(9 v (t)) . Changes 
in behavior for vertex v through time are modeled through changes in 9 V . For the Dirichlet distribution, 
this is a vector a = (ai, . . which specifies both the center and spread of the latent positions. Let 

v = {^i, . . . , v m } be a subset of {1, . . . , n). Our initial hypotheses are 

#o : v (t) = Qo,v G {1, . . . n}, t e (0, T) (12) 
ffi : 9 v (t) = o ,ve{l,...,n},te (0, Ti ) U (r 2 , T), (13) 
v (t) = Q ,v e {l,...,n}/v,t e (ri,r 2 ) 

0v(t) = u vev,te(T 1 ,T 2 ) 

for some unknown 9o,0i,m, v = {^i, . . . , ^ m }, t\ and We approach the detection of multiple change points 
and multiple sub-populations of vertices in a hierarchical manner. An initial partition produces estimates 



of (i>i, . . . , Vm) and (ri,T2) from the alternative model in (13) using the EM algorithm, as described in the 



next section. If the initial partition is shown to describe the data significantly better than the null model, as 
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measured by the test statistic M n defined below, the model is fit for further partitions, which are then tested 
for significance. 



Vertex population 



time 



t = 



t = r, 



t = To 



t = T 



Latent vertex positions 






Figure 2: Over an unknown time interval (ti,T2), the population of vertices may contain distinct subpopu- 
lations with respect to latent positions in the i\~-dimensional simplex. In this case, K=2. 



We seek to identify the number of partitions which optimally describe the data by defining a stopping 
rule to decide when further partitions are unnecessary. The hierarchical approach is used rather than si- 
multaneously estimating multiple partitions over vertices and time because the latter is computationally 
impractical. 

At each partitioning stage, the significance of the current partition can be phrased in terms of the number 
of components in a mixture distribution. If there are no distinct subgroups in a given set of vertices, and 
behavior does not change over a given time interval, the latent positions can be modeled using one component 
distribution, i.e. X v (t) ~ F(6),v G {l,...n},£ G (0,T). If the distribution of the X v (t) differs over time 
or over vertices, then a mixture model with two or more components describes the distribution of the latent 
positions. 

Determining the correct number of components in a mixture distribution is a difficult and well-studied 
problem. In this context, the likelihood ratio has been shown (jChernoff and Lander, 1995 Sen and Ghosh 



1985|)) to be badly behaved, without a simple limiting distribution. We make use of some recent work in 



testing for mixture problems ( |Chen et al. , 2004), employing a modified likelihood ratio test statistic to assess 
the increase in likelihood associated with each additional component. Because the model at hand is more 



complex than a standard mixture model and the applicability of the asymptotic results in Chen et al. (2004) 



is unclear, we use a simulation-based threshold in deciding to accept or reject each new partition. 

The modified version of the likelihood ratio test employs a penalty function to control the behavior of the 
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test statistic near the boundary of the parameter space. Their test statistic is given by 

M n = 2{l(§ , 0i ) - 1(0) + Clog(4 7 (l - 7 ))}, (14) 

where 7 is a mixing probability and C is a tuning parameter. We use M n to assess the significance of each 
new partition. 

Under certain regularity conditions, this test statistic has been shown asymptotically to have a mixture 
of x 2 as its limiting null distribution: 

M n ~ \xl + \xl (15) 
We compare this asymptotic approximation to a simulation-based threshold in Section [4] 



3 Model Fitting 

A stochastic conditional EM algorithm (iMeng and Rubin, 1993) is used to fit the model under heterogene- 



ity. The E-step at the (j + l) st iteration of the algorithm consists of computing (fi, T2) J+1 , a stochastic 
approximation to the conditional expectation of (ti,T2) given the current parameter estimates #q, 0{, and 
v- 7 (the current estimate of v = {v\, . . . v m }), and an updated estimate v J+1 of v. The M-step requires 
finding the updated maximum likelihood estimates and given the current estimates of ti, T2 and v. 
The non-standard aspects of the fitting algorithm are described in this section, with computational details 
available in the appendix. 

The success of the algorithm is dependent in part on the initial values used for the model parameters. 
Based on simulation results, the sensitivity of the algorithm to initial conditions seems to vary with the number 
of change points and of distinct subgroups of vertices, with more components increasing the sensitivity. An 
approach to producing initial values that we have found to work well is to divide the entire observed time 
interval (0, T) into segments, find candidate values of v for each segment, and then assess which combination 
of time interval and candidate v produces the highest likelihood under the 2-component mixture model. 

The details of the initial value procedure are as follows. The time interval (0, T) is divided into r equally 
sized segments, producing non-overlapping time intervals (0, T/r), (T/r, 2T/r), . . . ((r — l)T/r, T). In the 

— 1 — r 

unattributed edge case, the least squares estimates of the latent positions over each interval X , . . . , X are 
computed from the data in each time interval using the SVD procedure described below. This group of 
n latent positions are then clustered into two groups using k-means, producing an estimate of v for each 
interval. In the attributed edge case, using the data from each interval, a vector (pi, . . .px) is created for 
each vertex, where pk is the proportion of edges with attribute k. These vectors are then clustered using 
c-means, producing estimates of v for each interval. 

The E-step at the (j + l) th iteration of the algorithm consists of computing a stochastic approximation 
to the conditional expectation of (ti,T2) given the current parameter estimates J 0: J 1: and {vi, . . . v m } J and 
the current estimates of v. The approximate expectation is produced by simulating a large number (5,000, in 
our case) of candidate intervals {(ti^, £2,3) : 2 = 1, . . . 5000} from the uniform distribution over (0, T) x (0, T) 
and computing 

v(t^MA^ 3 AA) = y—z — — — (16) 

^ J2 z =i P(e\ti,z,t 2 ,z,v3 , 6 J , 6{) 

for each interval, where p(e\ . . .) is calculated using ( [27] ) or ( [39] ) . The collection of intervals and their associated 
conditional probabilities are then use to compute the approximate expectation (fi, T2) J ' +1 . 

The new estimate v J+1 is then computed using the updated estimate (fi, f^)- 7 '" 1 " 1 . For i = 1, . . . , n, we 
estimate the probability that vertex i is in the anomalous group v over the interval (fi, f^)- 7 '" 1 " 1 given the 
current parameter estimates and the current estimates of the other members of the anomalous group. We 
will assign vertex i to the updated estimate v J+1 if 

P (i e v\e,(f 1 ,7 i y +1 ,e 3 ,e{,^) > £ (17) 



for some threshold £, where we use £ = .5. The condition can be evaluated via Bayes rule and (27) or (139 
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Estimates and given the current estimates of Ti,T2 and v are computed in the M-step of the 
algorithm. This is somewhat complex as it involves integrating over the unobserved latent positions as 



well as dealing with non-identifiability issues. As in other latent position models ( jHandcock and Raftery 
2007 Gormley and Murphy, 2010| ), the likelihood in the unattributed edge dot product model is invariant to 



reflections and rotations of the latent positions (although it is not invariant to translations.) The likelihood is 
convex with respect to the dot product of the latent positions, but not necessarily with respect to the latent 
positions themselves. In the attributed edge case, the likelihood is not invariant to rotations and reflections, 
because the directions in the latent space have meaning with respect to the probabilities of various edge 



attributes. Handcock and Raftery (2007) resolve the non-identiflability in the latent distance model by post- 
processing the MCMC output to find the configuration of latent positions which is optimal in terms of Bayes 
risk. 

In order to resolve the non-identiflability for the unattributed random dot product model, we define an 
additional condition based on minimizing the squared difference between a modified adjacency matrix and a 
matrix consisting of the expected number of edges between each pair. 

Let X v be the average latent position of vertex v over a given time interval (ti,^)- The collection of 
latent positions over all vertices is the K x n matrix X = (X x , . . . X n ). The expected number of edges over 

lent of the 



(£1,^2) between vertices u and v is the (^,v) th element of the matrix 6X X, where 



(18) 



Let A be the multiadjacency matrix consisting of the number of edges between each vertex pair over the 
interval (ti,^)- We would like to compute an initial estimate of the latent positions by minimizing the 
difference between the observed and expected number of edges for each pair, given the latent positions. We 

— T — 

cannot directly compare A and £?X X, however, because the diagonal entries in the two ma trices are not 



comparable. We in stead use A, a modification of A with an augmented diagonal as described in|Scheinerman 



and Tucker 



(2010). Using singular value decomposition, we can easily find the latent positions X which 



minimize 



\\bX T X-A\\ 2 F , (19) 

the squared Frobenious norm of the difference between expected and observed edges. This least squares 
estimate of the latent positions is used as a starting point to a hill-climbing procedure to find the local 
maximum likelihood estimates of #0 an d Q\ which are closest to the minimizer of the least-squares condition. 
The computational details of the fitting algorithm are given in the appendix. 

In our model, there is non-identifiability between the Poisson process rate A and the parameters of the 
latent position distribution, which can be easily resolved. The expected number edges created in a given 
time interval increases with both A and the expected value of X u • X v ; increasing A produces more edge 
opportunities, and increasing E(X U • X v ) increases the probability of an edge at each opportunity. We 
approach this non-identifiability by fixing A to be 1.5 times the maximum number of edges observed in any 
time unit (weeks, in our application). Thus the expected number of edge opportunities (not necessarily edges) 
in a unit time interval is 50% greater than the maximum observed number. 



4 Simulations 

We simulate network datasets a = {ai, . . . , a/v} using the generative model in section[2]to test the performance 
of the partitioning algorithm. At random times ti, . . . , t N +, pairs of latent positions 

{(X Ul , X Vl ),..., (X Un+ , X v + )} are drawn from the (K + 1)- dimensional Dirichlet distribution. Change 
points are inserted at times t\ and T2, between which a subset of vertices {^i, . . .v m } G {1, . . . , n} has latent 
positions drawn from a Dirichlet distribution with parameter ot2 rather than ol\. 

The ability to detect time periods with heterogeneous subgroups increases with 0, the angle between ol\ 
and o<2, X = N/(n(n — l)/2), the average number of edges per pair, T2 — n, the length of the anomalous 
interval, and with m, the size of the anomalous subgroup (up to m = n/2). The average number of edges 
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Figure 3: Simulation results for the change point detection algorithm. The figures show sensitivity in detecting 
anomalous vertices (left), specificity (center), and average change point detection error (right). Results based 
on 100 simulations are shown for n=50 (top row) and n=150 (bottom row). 



per pair is a function of the length of the total time interval, the number of vertices n, the Poisson process 
rate A and the values of ql\ and 0^2- Performance can be measured by the correct identification of vertices 
as anomalous and non-anomalous, as represented by sensitivity (correct identification of anomalous vertices) 
and specificity (correct identification of non-anomalous vertices), and by (|ri — fi| + |t2 — T2D/2, the average 
error in estimates of the change points t\ and T2, when they are detected. Because of the invariance discussed 
in section [3J recovery of ol\ and 012 is not considered. 

Simulated data from networks of n = 50 an n = 150 vertices with anomalous subsets of size m = 10, 
K = 4 and various values of A and (j) are shown in Figure 3. A total time interval of length 100 is used 
with Poisson process rates 500, 350, 250, 150 and 75. The length of the anomalous interval is 40% of the 
total observed interval. In simulated data with moderate angles between the centroids of the latent position 
distributions, (</> = 76°) we find good performance in datasets with N > 2. For larger angles between cluster 
centroids, good performance can be acheived for N = 1. 



5 Application to Enron Data 



As an application, we analyze the Enron email corpus ( |Cohen 2009). Following its investigation of Enron, 
the Federal Energy Regulatory Commission publicly released emails sent between 1998 and 2002 among 
184 email addresses belonging to roughly 150 Enron employees, including high-level executives. We a use 



version of the dataset which has been processed to correct some integrity problems (Priebe et al. 2005). 



Previous analyses (Fu et al. , 2009 Diesner et al. , 2005 Priebe et al. 2005) have found these data to exhibit 



heterogeneity over time, and to exhibit structure across vertices. We use Michael Berry's topic classifications 
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for the 2001 data (Berry et al. , 2007), which assigns one of 32 topics to most messages. Using the assigned 
topics as edge attributes, we apply our algorithm to study time-dependent structure in the dataset over the 
calendar year 2001. 



Fu et al. (2009) have analyzed these data using the dynamic mixed- membership stochastic block model. 
Using information on the direction of the connections, the blockmodel provides a useful categorization of 
email behavior. Fu et al. identify, for example, groups of employees who tend to send and receive emails 
among themselves as cliques, groups that receive emails but rarely send them, etc. Their analysis of the 
changing membership vectors associated with individual email addresses reveals changes in the frequency of 
emails in late 2001 as the company was going bankrupt, and a decrease in emails sent by high-level executives 
over the same period. 

Analysis of the Enron data using the attributed edge dot product model reveals multiple change points. 
We restrict our attention to data on emails which have been assigned one of the 4 most popular email topics. 
The average number of connections between pairs is 1.1. Although this is a subset of the entire email corpus, 
and likely contains some residual integrity problems, we find it to be an informative application for our 
partitioning algorithm. 

In the attributed model, the dominant change point detected (that which is identified in the first partition) 
is in mid- August 2001, after which the detected heterogeneity persists until mid-December 2001. This time 
interval is significant with respect to major events related to the collapse of the company. CEO Jeffrey 
Skilling resigned on August 15, 2001 and over the following weeks the Enron stock price decline sharply and 
concerns over the company's accounting practices became public. Enron declared bankruptcy in December 
2001. The overall message rate increased somewhat in this period with respect to the January - August 
period (423 vs 297 messages per week, on average.) However, the graph partitioning algorithm identifies 
a subgroup of 36 email addresses for which communication decreases significantly. There is also a shift in 
distribution of message subjects (edge attributes). A second anomalous time period is detected between mid 
April and late July. Here, a group of 88 email addresses show a greatly increased communication rate and 



a change in email attributes. These results are consistent with the "chatter anomalies" detected in Priebe 
letaE] ( [2005] ) using scan statistics on an unattributed version of the same data. 

6 Discussion 

We have presented a novel algorithm and associated model for time-dependent network partitioning capable 
of discovering dynamic network structure. An important feature of this model is the inclusion of potential 
edge attributes, facilitating a more rich description of datasets such as the Enron email corpus. The filtered 
Poisson process model is a flexible framework for describing inhomogeneities in the rate of edge creation over 
time and over the graph. The detection of meaningful partitions is approached by fitting a mixture model 
and comparing it to a model with a homogeneous distribution of latent positions. 

The model fitting algorithm simultaneously estimates latent positions and cluster structure, which has 



been shown in related contexts (Handcock and Raftery 2007) to give better results than a two-stage procedure 



(position estimation followed by clustering). We approach the detection of multiple change points and multiple 
vertex subpopulations by iterating the partitioning algorithm. Iterative partitioning may not perform as well 
as a simultaneous estimation of multiple components, but we find the latter to be computationally infeasible. 
In order to test each partition for significance, we use the modified likelihood ratio test statistic proposed by 
Chen et ( |2QQ4] ) . Fitting the random dot product mixture model presents some computational challenges. 



Like the latent-distance model (Hoff et al. , 2002), the likelihood surface described by the random dot product 
model is non-convex, and the latent positions are non- unique. We impose an additional condition based 
on the least-squares optimal latent positions in order to define a unique maximum likelihood solution. We 
find this to be a fast and reliable solution to the non-uniqueness problem, although the precise relationship 
between the least-squares optimal solution estimates and the maxima of the likelihood surface is unclear. 

| Joseph and Wolfson| (|1993) discuss the consistency of maximum likelihood estimates in the "multi-path" 
change point problem, in which a collection of time series are observed, each potentially containing a unique 
change point. The locations of change points in individual time series are assumed to be independent of 
one another. They find that under certain regularity conditions, the maximum likelihood estimates of the 
change point locations and of the pre- and post-change parameters are consistent estimators. In our context, 
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each vertex has an associated time course with a possible change point. In contrast to the case described by 



Joseph and Wolfson (1993), the change points of individual vertex time courses are not independent of one 
another; this non-independence makes analytic results on the asymptotic behavior of the parameter estimates 
difficult. 

The proposed model for edge creation has a simple functional form, and while it is able to capture features 
frequently observed in real network data, it may not incorporate all available information. Several extensions 
of the model are possible. In particular, we have allowed for the parameters distributions of latent positions 
to change in time, but have not incorporated temporal dependence in the latent processes associated with 



each vertex. Lee and Priebe (2011| have analyzed the performance of inferences based on first and second 



order approximations to time- dependent latent dot product models. It may be possible to (for example) 
fit an autoregressive component to the proposed latent process model by updating latent position estimates 
with each new edge event, conditional on the estimated positions at the last edge event. For data in which 
the overall rate of edges is known to change over time, the model could be modified in a simple manner by 
specifying a non-constant Poisson process rate A. Under such conditions, the partitioning algorithm would 
detect changes in network behavior with respect to the overall dynamic message rate. In some applications, 
for example the Enron email data, vertex-level covariates and directional information are also available. 
A model which includes covariate information could be used in place of the simple dot product model. In 
general, the filtered Poisson model and change-point framework described above may be useful in combination 
with other latent position models for network data. 

A Appendix 

Here, we present more detailed information on the fitting algorithm for both the attributed and unattributed 
edge cases. 

A.l Attributed edge case 

Let yi and z^ i = 1, . . . N be indicators of whether the vertices associated with each edge event are contained 
in the set v, and Si, i = 1, . . . N be indicators of which edges occur during the interval (n, T2): 



_ 1 1 if ui G v, _ 1 j. 11 ui kz v, . _ , v 

Vi ~ 1 otherwise ' * ~ 1 n ^w^_ z - ±, - - - iv, ^uj 



f 1 if Vi G v, 

I otherwise, 

1 if U £ (ti,t 2 ), 

otherwise, 



At each edge event ai = ki), the associated latent positions X u . and X v . are each drawn from either 

f{oL\) or /(ao), where f(a) is the Dirichlet distribution, depending on the values of Ti,T2, and v. We can 
classify each edge event according to whether 0, 1 or 2 of the associated vertices are drawn from f(a\) rather 
than /(ao). Let Nq : Ni, and N2 be vectors of length K containing the frequency of edge attributes for for 
events containing 0, 1, or 2 latent positions drawn from f(ai). The kih respective elements of these vectors 
are 

N 

N , k = ^((1 - + Sl (l - 2/0(1 - Zi)) * l* 4 =fc, (21) 

2=1 

N 

N lyk = - Zi) + SiZi(l - yi)) * lfc,=fc, (22) 



i=l 

N 



N 2 ,k = ^2siyiZi*l ki=k , (23) 



i=l 



fc = !,...#. (24) 
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A APPENDIX 



Let 

K K K 



n q = y. n ^ n x = y, n ^ n 2 = y, n ^ ( 25 ) 

k=l k=l k=l 

and 

/n — rn\ /rn\ 

7o = A(T-(t 2 -ti)i^), 72 = X(t2-h)^j. 71 = AT -70 -72, (26) 

where m is the number of elements in the set v. The E-step requires computing the likelihood of the 
data given the model parameters as a function of the Dirichlet parameters ao = (a<o,i, • • • , ct$,K+i)' and 
ai — (ai,i, . • • , cti^K+i)' after integrating over the unobserved latent positions: 

N K n f at \ 



p(eKa 1 ,r 1 ,r 2 ,v)ocn _ ' J (l - £ (^) J ( 27 ) 

i=l iyi 1 * /c=l 



where 

= ao^, ^i = (28) 

fc=l /c=l 

(29) 

The CM-step of the algorithm is to compute updated estimates a^ +1 and based on the updated estimates 
of ff +1 , fj +1 and vJ +1 and the estimates a J and from the last iteration. Let s^,^, and Z{ be defined as 
in (|o| using f/ +1 ,f| +1 and v^' +1 . Then 



i+ i _ «o{((7i^i, fc M) 2 + 87o(^o,fe + ^i,fc)) 1/2 - Ti^UMI 

270 



«o = ^ > fe = l,...,if, (30) 



i • /(Ef-i«oV) 1/2 ^ • A 
<*o^+i =«o( ^gf^ X^o^ 1 )' ( 31 ) 

To" fe=1 

_ 5j{((7i< fc A%) 2 + 872(AT 2 , fc + iVi.fc)) 1 / 2 - 7i< fc M} 

E K « J+1 ) 1/2 K 
^i^+i = «i ( — S ^) • ( 33 ) 

7i 

A. 2 Unattributed edge case 



k=i 
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Let Si , yi , Zi , 70 , 71 and 72 be defined as above and 

iV 



A^o = ^(1 - *) + 5,(1 - *)(1 - j/i), (34) 

i=l 
N 

N i = ^2 SiVi ( 1 ~ + s * z i0- ~ y*)i ( 35 ) 

i=l 

N 

N 2 = ^2siyiZi. (36) 



The E-step requires the expected likelihood after integrating out the latent positions. Let 770 and rji be 
the first K elements of ao and a\ respectively: 

Vo = (a ,i, • • .,Oio iK )' (37) 
m = ( a i,i> • • • (38) 

The E step estimates for t\ , r 2 and v can then be found using 

/, \ TT Vsivi ' Vsiyi rjo - rjo\h°- N o) / rj • 771 \ (71-^1) / 771 • 771 \ (72-^2) 
p(eai,a 2 ,Ti,T 2 ,v) ocll _^ _ ( 1 =3— 1 - 1 =3— •• (39) 

Let A be the n x n multiadjacenty matrix associated with e. A can be decomposed into matrices A and 
A such that A = A -\- A where 

N 

A u , v = ^2 1 n i =u,^^((l - Si) + Si(l - - Zi)) (40) 

= A uv . (41) 

and 

The CM-step estimate a J+1 is based on estimates of X, the K x n matrix of latent positions, computed 
via an iterated algorithm based on gradient ascent. The starting point of the algorithm is the least squares 
estimate of X from the SVD procedure described in Section [3] From this, the MLEs of ao and a\ are com- 
puted. At each iteration of the algorithm, the new estimates of X^, i = 1 . . . n are then updated sequentially 
in the direction of the local maxima along the gradient of g(X) = p(X\e, ao, ai, Ti, T2, v): 

for i e v^' +1 

AGKX,)) = « T * * T + (vl - 1)/X?)/N, 

for i e {l,...,n}/V' +1 

A(g(X z )) = {P? * X T + (r/ T - 1)/Xf )/iV, (42) 

where and Pi are the ith columns of the matrices P = Ai/(X T X) and P = A^/(X T X), respectively, and 
" / " denotes elementwise division. After each update of X, the MLEs of ao and a\ are recomputed using 
the relevant latent positions, i.e. those corresponding to draws from /(ao) and /(ai), based on the current 
estimates of ti,T2 and v. 
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